@ilayaperumalg
Member

  • The TokenTextSplitter was incorrectly splitting small text into multiple chunks when punctuation marks were present, even when the entire text was well below the configured chunk size.

  • Only apply punctuation-based truncation when the remaining tokens exceed the chunk size (tokens.size() > chunkSize). This ensures small texts remain as single chunks while preserving correct splitting behavior for larger texts.
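
The guard described above can be sketched in isolation. This is a simplified illustration, not the actual TokenTextSplitter code: it assumes one character counts as one token (the real splitter uses a tokenizer encoding), and the class name `PunctuationSplitSketch`, the method `split`, and the parameter `minChunkSizeChars` as used here are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one character is treated as one "token" to keep it self-contained.
class PunctuationSplitSketch {

    static List<String> split(String text, int chunkSize, int minChunkSizeChars) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        List<String> chunks = new ArrayList<>();
        String remaining = text;
        while (!remaining.isEmpty()) {
            // Take at most chunkSize "tokens" from the front of the remaining text.
            int end = Math.min(chunkSize, remaining.length());
            String chunk = remaining.substring(0, end);

            // The guard from this PR: look for a punctuation boundary only when
            // more text remains than fits in a single chunk.
            if (remaining.length() > chunkSize) {
                int lastPunctuation = Math.max(
                        Math.max(chunk.lastIndexOf('.'), chunk.lastIndexOf('?')),
                        Math.max(chunk.lastIndexOf('!'), chunk.lastIndexOf('\n')));
                if (lastPunctuation != -1 && lastPunctuation > minChunkSizeChars) {
                    // Cut the chunk just after the last punctuation mark.
                    chunk = chunk.substring(0, lastPunctuation + 1);
                }
            }

            String trimmed = chunk.trim();
            if (!trimmed.isEmpty()) {
                chunks.add(trimmed);
            }
            // Advance past the untrimmed chunk; it always has at least one character.
            remaining = remaining.substring(chunk.length());
        }
        return chunks;
    }
}
```

With a chunk size larger than the input, a text such as "Hello. This text has no ending punctuation" stays in one chunk under this sketch, because the punctuation search is skipped entirely. Without the `remaining.length() > chunkSize` check, the same input would be cut after "Hello." and the tail emitted as a second chunk.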

Fixes #4981

Signed-off-by: Ilayaperumal Gopinathan <ilayaperumal.gopinathan@broadcom.com>
@ilayaperumalg added this to the 2.0.0.M1 milestone Dec 2, 2025
@ilayaperumalg added the "for: backport-to-1.1.x" and "bug" labels Dec 2, 2025
@markpollack
Member

Merged.

main: e065965
1.1.x: 8cc4ea4

@markpollack closed this Dec 3, 2025


Development

Successfully merging this pull request may close these issues.

TokenTextSplitter.split() splits small text into multiple chunks if there is no ./?/!/\n at the end.
